1. INTRODUCTION

Breast cancer is one of the most common causes of cancer death. About half a million people, most of them women, die of it every year around the world. According to the WHO, in 2018 it accounted for approximately 15% of all cancer deaths among women [1]. Cancer cells can grow and spread in different parts of the breast tissue: some in the ducts that carry milk to the nipple, some in the lobules that produce breast milk, and some in the other supporting tissue of the breast [2]. Treatment differs from patient to patient according to the class of the cancer, which is determined by specialist doctors. Machine learning techniques are often used to detect breast cancer because they save time and can help doctors reach a more accurate diagnosis. Machine learning methods that have been used to classify breast cancer include the normed kernel function-based fuzzy possibilistic C-means algorithm [3], sparse learning-based fuzzy C-means [4], a deep learning approach [5], a combination of K-means, the fuzzy C-means algorithm and a kernel function [6], convolutional neural networks [7], a hybrid deep neural network [8], and SVM with the Hough transform [9]. In this research, a breast cancer dataset is classified using linear discriminant analysis and support vector machines, both of which have performed well in disease diagnosis and classification.
2. RESEARCH METHOD
2.1. Dataset 

This research uses the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI Machine Learning Repository [10]. The dataset has 32 attributes and 569 instances with no missing values. Of these instances, 357 are benign and 212 are malignant; benign and malignant are the two values of the diagnosis attribute. All features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and are recorded with four significant digits. They describe characteristics of the cell nuclei present in the image.
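As a small illustration of what the class counts above imply, the sketch below turns them into class proportions and into the accuracy of a trivial classifier that always predicts benign. Using the proportions as the prior probabilities needed by LDA in Section 2.2 is an assumption here, since the paper does not state how the priors were set; the names `counts`, `priors` and `majority_baseline` are illustrative.

```python
import math

# Class counts reported for the WDBC dataset in Section 2.1
counts = {"B": 357, "M": 212}
n_total = sum(counts.values())  # total number of instances

# Class proportions, used here as estimates of the prior probabilities
# (an assumption: the paper does not say how the priors were chosen)
priors = {label: c / n_total for label, c in counts.items()}

# Log prior ratio log(P(B) / P(M)); the LDA decision threshold depends on it
log_prior_ratio = math.log(priors["B"] / priors["M"])

# Accuracy of a trivial classifier that always predicts the majority class (benign)
majority_baseline = max(counts.values()) / n_total

print(f"P(B) = {priors['B']:.4f}, P(M) = {priors['M']:.4f}")
print(f"log prior ratio = {log_prior_ratio:.4f}")
print(f"majority-class accuracy = {100 * majority_baseline:.2f}%")
```

The majority-class accuracy of about 62.7% is a useful reference point when reading the accuracies reported in Section 3.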


2.2. Linear discriminant analysis (LDA) 

Linear discriminant analysis (LDA) is a discriminant analysis method that can be used for classification and dimensionality reduction [11-13]. The main purpose of linear discriminant analysis is to predict the most likely category when there are multiple class labels [14]. It computes a linear combination $Z$ of the features $x_1, \dots, x_d$ with coefficients $\beta_1, \dots, \beta_d$:


Z = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_d x_d    (1)


S(\beta) = \frac{\beta^T \mu_1 - \beta^T \mu_2}{\beta^T C \beta}    (2)


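where $S(\beta)$ is the score function. Equivalently, the score is the difference between the group means of $Z$ relative to the variance of $Z$ within the groups: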


S(\beta) = \frac{\bar{Z}_1 - \bar{Z}_2}{\text{variance of } Z \text{ within groups}}    (3)

The linear coefficients are estimated as the values that maximize the score function. They are calculated with the following formulas:


\beta = C^{-1} (\mu_1 - \mu_2)    (4)
C = \frac{1}{n_1 + n_2} (n_1 C_1 + n_2 C_2)    (5)


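where $\beta$ denotes the linear model coefficients, $C$ the pooled covariance matrix obtained from the covariance matrices $C_1$ and $C_2$ of the two groups with sizes $n_1$ and $n_2$, and $\mu_1$ and $\mu_2$ the mean vectors of the two groups.
To measure how well the two groups are separated, the Mahalanobis distance is used: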


\Delta^2 = \beta^T (\mu_1 - \mu_2)    (6)

\beta^T \left( x - \frac{\mu_1 + \mu_2}{2} \right) > -\log \frac{P(C_1)}{P(C_2)}    (7)


In these equations, $\Delta$ is the Mahalanobis distance between the two groups, $x$ is the data vector, and $P(C_1)$ and $P(C_2)$ are the prior probabilities of the two classes. In the final step, a new observation $x$ is classified into class $C_1$ if the condition in (7) is satisfied, and into class $C_2$ otherwise [15-16].
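The two-class procedure in (4)-(7) can be sketched in plain Python as below. This is a minimal illustration on made-up two-dimensional data, not the code used in this research (which was run in RStudio). It assumes maximum-likelihood (divide-by-n) group covariances, prior probabilities estimated from the group sizes, and a small hypothetical helper `solve` that solves the linear system $C\beta = \mu_1 - \mu_2$ instead of forming $C^{-1}$ explicitly.

```python
import math

def mean(rows):
    """Column-wise mean vector of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, mu):
    """Maximum-likelihood covariance matrix (divides by n, an assumption)."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    M = [list(A[i]) + [b[i]] for i in range(d)]  # augmented matrix [A | b]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        x[i] = (M[i][d] - sum(M[i][j] * x[j] for j in range(i + 1, d))) / M[i][i]
    return x

def fit_lda(group1, group2):
    n1, n2 = len(group1), len(group2)
    mu1, mu2 = mean(group1), mean(group2)
    C1, C2 = covariance(group1, mu1), covariance(group2, mu2)
    d = len(mu1)
    # Eq. (5): pooled covariance matrix
    C = [[(n1 * C1[i][j] + n2 * C2[i][j]) / (n1 + n2) for j in range(d)]
         for i in range(d)]
    diff = [a - b for a, b in zip(mu1, mu2)]
    beta = solve(C, diff)                              # Eq. (4)
    delta2 = sum(b * v for b, v in zip(beta, diff))    # Eq. (6)
    midpoint = [(a + b) / 2 for a, b in zip(mu1, mu2)]
    # Right-hand side of Eq. (7) with priors n1/N and n2/N (an assumption)
    threshold = -math.log(n1 / n2)
    return {"beta": beta, "midpoint": midpoint,
            "threshold": threshold, "delta2": delta2}

def predict(model, x):
    """Eq. (7): class 1 if beta^T (x - (mu1 + mu2)/2) exceeds the threshold, else class 2."""
    score = sum(b * (xi - m) for b, xi, m in zip(model["beta"], x, model["midpoint"]))
    return 1 if score > model["threshold"] else 2

# Hypothetical toy data (not WDBC): two separated groups in 2-D
group1 = [[2, 3], [3, 3], [3, 4], [4, 5]]
group2 = [[6, 1], [7, 2], [8, 1], [7, 0]]
model = fit_lda(group1, group2)
print(model["beta"], predict(model, [3, 4]), predict(model, [7, 1]))
```

Solving the linear system rather than inverting $C$ is the usual numerically safer choice. With equal group sizes, as in this toy example, the prior term vanishes and the rule reduces to comparing the projection of $x$ with the midpoint of the projected group means.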


2.3. Support vector machines (SVM) 

A support vector machine (SVM) is a supervised machine learning technique for classification and regression problems, proposed by Vapnik et al. in 1992. SVM is a computational algorithm that learns to assign labels to objects from examples. SVM has been applied to medical diagnosis [17-19], weather prediction, finance [20], stock market analysis [21-22] and image processing [23]. The fundamental idea of SVM is to separate binary-labeled data with a line, or more generally a hyperplane, that has the maximum distance from the labeled data [24]. To separate the labeled data, SVM uses a hyperplane that divides the space into two regions, with one class lying on either side, and maximizes the margin between them. Consider a dataset $\{x_i, y_i\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ is a feature vector, $y_i \in \{-1, 1\}$ is its class label for binary classification, and $N$ is the number of samples [25]. Since the goal of SVM is to find the optimal separating hyperplane, that hyperplane is written as:


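w \cdot x + b = 0    (8)
The decision function can be expressed as:
f(x) = \operatorname{sign}(w \cdot x + b)    (9)
In (9), $w = \sum_{i=1}^{N} \alpha_i y_i x_i$ and $b = \frac{1}{N_s} \sum_{s \in S} \left( y_s - \sum_{m \in S} \alpha_m y_m x_m \cdot x_s \right)$, where $\alpha_i \ge 0$ are the Lagrange multipliers of the SVM optimization problem, $S$ is the set of support vectors (the samples with $\alpha_i > 0$) and $N_s$ is the number of support vectors.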
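To make the roles of $w$, $b$ and the multipliers $\alpha_i$ concrete, the sketch below evaluates (8) and (9) for a hand-worked toy problem. The data and multipliers are hypothetical: for the points (1, 1) with label +1 and (-1, -1) with label -1, the hard-margin solution has $\alpha_1 = \alpha_2 = 0.25$, and a third point (2, 3) lies outside the margin, so its multiplier is 0. The sketch only applies a given solution; it does not train an SVM, which requires solving a quadratic programming problem.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_weights(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i."""
    d = len(xs[0])
    return [sum(a * y * x[j] for a, y, x in zip(alphas, ys, xs)) for j in range(d)]

def svm_bias(alphas, ys, xs, w, tol=1e-12):
    """b averaged over support vectors (alpha_s > tol): mean of y_s - w . x_s,
    where w . x_s equals sum_m alpha_m y_m x_m . x_s."""
    sv = [(y, x) for a, y, x in zip(alphas, ys, xs) if a > tol]
    return sum(y - dot(w, x) for y, x in sv) / len(sv)

def decide(w, b, x):
    """Eq. (9): sign(w . x + b); a point exactly on the hyperplane is assigned +1 here."""
    return 1 if dot(w, x) + b >= 0 else -1

xs = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
ys = [1, -1, 1]
alphas = [0.25, 0.25, 0.0]   # hand-derived dual solution for this toy set

w = svm_weights(alphas, ys, xs)
b = svm_bias(alphas, ys, xs, w)
margin = 2 / math.sqrt(dot(w, w))   # margin width 2 / ||w||
print(w, b, margin, decide(w, b, (3.0, 0.0)), decide(w, b, (-2.0, 0.5)))
```

Here the margin width $2/\lVert w \rVert$ equals the distance between the two support vectors, as expected when they are the only points that touch the margin.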


3. RESULTS AND ANALYSIS 

This research used RStudio to run both methods, support vector machines and linear discriminant analysis. Figure 1 shows the result of linear discriminant analysis: the red graph represents the samples diagnosed as benign and the blue graph those diagnosed as malignant. This indicates that linear discriminant analysis successfully classifies the dataset. According to Tables 1 and 2, both methods correctly classify 355 benign samples and misclassify 2 benign samples as malignant. The number of correctly classified malignant samples is 194 for linear discriminant analysis (Table 1) and 207 for support vector machines (Table 2). Accuracy, sensitivity, specificity and F1-score were evaluated with 80% of the data used for training and 20% for testing; the results are given in Table 3.
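The 80%/20% split described above can be sketched as follows. The paper does not say whether the split was stratified or how it was randomized, so the stratification by diagnosis, the rounding rule and the fixed seed below are assumptions; `stratified_split` is a hypothetical helper that works on row indices.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=1):
    """Split row indices into train/test sets, keeping each class's proportion.

    Hypothetical helper: stratification, rounding and seed are assumptions.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))   # number of training rows for this class
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# Diagnosis labels with the WDBC class counts (357 benign, 212 malignant)
labels = ["B"] * 357 + ["M"] * 212
train, test = stratified_split(labels)
print(len(train), len(test))
```

Stratifying keeps the benign/malignant ratio of the test set close to that of the full dataset, which matters when the classes are imbalanced.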

From Table 3, support vector machines (SVM) perform better than linear discriminant analysis (LDA) on every measure except sensitivity, where the two are equal. Accuracy is the proportion of all samples that the model classifies correctly: 98.77% for SVM against 96.49% for LDA. The reported sensitivity and specificity treat benign as the positive class. Sensitivity is therefore the proportion of benign samples classified as benign, 99.44% (355 of 357) for both methods, and specificity is the proportion of malignant samples classified as malignant, 97.64% (207 of 212) for SVM and 91.51% (194 of 212) for LDA. The F1-score, the harmonic mean of precision and sensitivity, is 99% for SVM and 97.26% for LDA.


Table 1. Confusion matrix of linear discriminant analysis
                 Reference
Prediction       B        M
B                355      18
M                2        194

Figure 1. Linear discriminant analysis classification result

Table 2. Confusion matrix of support vector machines
                 Reference
Prediction       B        M
B                355      5
M                2        207

Table 3. Result from linear discriminant analysis and support vector machines
Measure            LDA      SVM
Accuracy (%)       96.49    98.77
Sensitivity (%)    99.44    99.44
Specificity (%)    91.51    97.64
F1-score (%)       97.26    99
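The measures in Table 3 follow directly from the confusion matrices in Tables 1 and 2. The sketch below recomputes them, treating benign (B) as the positive class, which is the convention that reproduces the reported values; the dictionary layout keyed by (prediction, reference) and the function name `metrics` are illustrative choices.

```python
def metrics(cm, positive="B", negative="M"):
    """Accuracy, sensitivity, specificity and F1 (in %) from a 2x2 confusion matrix.

    cm maps (prediction, reference) pairs to counts.
    """
    tp = cm[(positive, positive)]   # positive samples predicted positive
    fn = cm[(negative, positive)]   # positive samples predicted negative
    fp = cm[(positive, negative)]   # negative samples predicted positive
    tn = cm[(negative, negative)]   # negative samples predicted negative
    total = tp + fn + fp + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        # harmonic mean of precision and sensitivity, written as 2TP / (2TP + FP + FN)
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
    }

# Counts from Table 1 (LDA) and Table 2 (SVM)
lda = {("B", "B"): 355, ("B", "M"): 18, ("M", "B"): 2, ("M", "M"): 194}
svm = {("B", "B"): 355, ("B", "M"): 5, ("M", "B"): 2, ("M", "M"): 207}

for name, cm in [("LDA", lda), ("SVM", svm)]:
    print(name, {k: round(v, 2) for k, v in metrics(cm).items()})
```

For SVM the F1-score evaluates to 99.02%, which Table 3 reports rounded to 99%.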


4. CONCLUSION 

According to the results, both support vector machines and linear discriminant analysis perform well in terms of accuracy, sensitivity, specificity and F1-score. Comparing the two methods on these results, support vector machines perform better than linear discriminant analysis: they match it in sensitivity and outperform it in accuracy, specificity and F1-score. Support vector machines have been widely used by researchers, especially for breast cancer classification, because of their good performance. Support vector machines are therefore suggested as a tool to help doctors predict and classify this disease and similar datasets.